A Generalization Error for Q-Learning

Author

  • Susan A. Murphy
Abstract

Planning problems that involve learning a policy from a single training set of finite horizon trajectories arise in both social science and medical fields. We consider Q-learning with function approximation for this setting and derive an upper bound on the generalization error. This upper bound is in terms of quantities minimized by a Q-learning algorithm, the complexity of the approximation space and an approximation term due to the mismatch between Q-learning and the goal of learning a policy that maximizes the value function.
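As context for the abstract, below is a minimal sketch of batch Q-learning with linear function approximation on a single training set of finite-horizon trajectories, fit by backward induction. It illustrates the setting only, not the paper's exact estimator; the names `batch_q_learning` and `featurize`, the `n_actions` argument, and the trajectory layout are assumptions made for illustration.

```python
import numpy as np

def batch_q_learning(trajectories, horizon, featurize, n_actions):
    """Fitted Q-learning by backward induction over a finite horizon.

    trajectories: list of [(state, action, reward), ...] of length `horizon`
    featurize:    maps (state, action) to a 1-D feature vector (assumed)
    Returns one linear weight vector per decision time t.
    """
    weights = [None] * horizon
    for t in reversed(range(horizon)):
        X, y = [], []
        for traj in trajectories:
            s, a, r = traj[t]
            target = r
            if t + 1 < horizon:
                s_next = traj[t + 1][0]
                # plug-in maximization over actions at the next stage
                target += max(featurize(s_next, b) @ weights[t + 1]
                              for b in range(n_actions))
            X.append(featurize(s, a))
            y.append(target)
        # least-squares fit of the stage-t Q-function
        weights[t], *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y),
                                         rcond=None)
    return weights
```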


Similar articles

A distinct numerical approach for the solution of some kind of initial value problem involving nonlinear q-fractional differential equations

Fractional calculus generalizes integer-order integration and differentiation to arbitrary (non-integer) order. q-fractional differential equations typically describe physical processes posed on the time scale Tq. In this paper, we first propose a difference formula for discretizing the Caputo-type fractional q-derivative of a given order and scale index. We es...
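For readers outside q-calculus, the following standard background definitions (not the paper's specific difference formula, which is truncated above) may help fix notation.

```latex
% Background, standard definitions rather than the paper's scheme:
% one common choice of the time scale, and the Jackson q-derivative
% on which q-fractional calculus is built (0 < q < 1).
\[
  \mathbb{T}_q = \{\, q^{n} : n \in \mathbb{Z} \,\} \cup \{0\},
  \qquad
  (D_q f)(x) = \frac{f(x) - f(qx)}{(1 - q)\,x}, \quad x \neq 0,
\]
% and D_q f recovers the ordinary derivative f' as q -> 1.
```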


Does generalization performance of lq regularization learning depend on q? A negative example

lq-regularization has been demonstrated to be an attractive technique in machine learning and statistical modeling. It attempts to improve the generalization (prediction) capability of a machine (model) by appropriately shrinking its coefficients. The shape of an lq estimator varies with the choice of the regularization order q. In particular, l1 leads to the LASSO estimate, while l2 corres...
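The lq estimator the abstract refers to can be written, in its standard regression form, as follows (lambda is a tuning parameter; this is the generic formulation, not a detail taken from the paper).

```latex
% Standard l_q-regularized least squares:
\[
  \hat{w} \;=\; \arg\min_{w}\;
  \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - x_i^{\top} w\bigr)^{2}
  \;+\; \lambda \sum_{j=1}^{p} \lvert w_j \rvert^{q},
\]
% q = 1 recovers the LASSO penalty and q = 2 recovers ridge regression.
```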


Attentional Mechanisms as a Strategy for Generalization in the Q-Learning Algorithm

In the last few years, reinforcement learning algorithms have been proposed as a more natural way of modelling animal learning. Unlike supervised learning methods, reinforcement learning addresses the basic problem faced by an animal trying to control a discrete stochastic dynamic system: discover, by trial and error, a policy of actions that maximises some criterion of optimality, usually e...
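For reference, the one-step update underlying the Q-learning algorithm discussed here is the standard rule:

```latex
% Standard one-step Q-learning update, with learning rate alpha,
% discount factor gamma, reward r, and successor state s':
\[
  Q(s, a) \;\leftarrow\; Q(s, a)
  + \alpha \Bigl[\, r + \gamma \max_{a'} Q(s', a') - Q(s, a) \,\Bigr].
\]
```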


Learning dynamics of simple perceptrons with non-extensive cost functions.

A Tsallis-statistics-based generalization of the gradient descent dynamics (using non-extensive cost functions), recently introduced by one of us, is proposed as a learning rule for a simple perceptron. The resulting Langevin equations are solved numerically for different values of an index q (q = 1 and q ≠ 1 correspond to the extensive and non-extensive cases, respectively) and for different co...
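As a point of orientation, the extensive (q = 1) case reduces to ordinary gradient-descent Langevin dynamics; the sketch below shows that baseline, with generic symbols J for the perceptron weights and E for the cost (the paper's q ≠ 1 deformation is not reproduced here).

```latex
% Ordinary (extensive, q = 1) gradient-descent Langevin dynamics
% for perceptron weights J_i, cost E, learning rate eta, noise xi_i:
\[
  \frac{dJ_i}{dt} \;=\; -\,\eta\, \frac{\partial E(J)}{\partial J_i}
  + \xi_i(t).
\]
% The non-extensive case (q != 1) replaces E with a Tsallis-deformed cost.
```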


Connectionist Q-learning in Robot Control Task

The Q-learning algorithm suggested by Watkins in 1989 [1] belongs to the family of reinforcement learning algorithms. In robot control tasks, reinforcement learning takes the form of a multi-step adaptation procedure. The main feature of this technique is that, during learning, the system is not shown how to act in a specific situation; instead, learning proceeds by trial and error using re...
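To make the trial-and-error character concrete, here is a minimal tabular Q-learning loop in Python. The gym-style `env` interface (reset/step) and all parameter names are assumptions for illustration, not details from the paper.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning: learn a policy from rewards alone."""
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy: explore occasionally, otherwise exploit
            if rng.random() < eps:
                a = int(rng.integers(n_actions))
            else:
                a = int(Q[s].argmax())
            s_next, r, done = env.step(a)
            # the agent is never told the correct action; it bootstraps
            # from the scalar reward signal only
            target = r + gamma * Q[s_next].max() * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```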



Journal:
  • Journal of Machine Learning Research (JMLR)

Volume 6, Issue –

Pages –

Publication date: 2005